ABSTRACT

Analyzing word frequency in six complete texts, a study
investigated how vocabulary can be used to define texts. The texts
included stories from 5th and 6th grade readers, selections from
literature anthologies for 8th grade and 12th grade students, and a
magazine essay for adults. Results indicated that if particular
words occur frequently in a text, they do so because the language
requires it. Few words were found to occur in all of the texts;
those that did were almost all function words, "be" forms, or
pronouns. Analyses of word frequency lists revealed that some
content words occurred more than others simply because they
referred to common concepts. Overall, findings suggested that the
wording of text is not random and that, in fact, texts
"self-control" their vocabulary. These results suggested that (1)
readers can build dependable strategies for dealing with words
common in a text but less common in the language as a whole; (2)
authors and editors should concentrate on relating the content of
texts to the audience rather than focus on controlling vocabulary
through the use of word lists; and (3) teachers concerned with
vocabulary development should focus on functional use of words in
the context of real texts rather than resort to decontextualized
lists or dictionary exercises. (JD)
Statement of Purpose

This series of working papers will provide a report of our current
thinking and make available the work of our program to those who may
be interested. It is our intent to stimulate an ongoing dialogue with
other professionals who share similar interests in educational theory
and practice. We welcome responses from readers. Comments may be
directed to the author of the paper or to the directors of the
program.

Some but not all of the papers may appear in other publications in
modified form. We are making this publication available at cost.



Word Frequency: In Texts and In General

The topic of this research report is really the wording of texts;
Halliday considers "wording" the folk term for what he describes as
the lexico-grammar of the language (Halliday, 1981). As such the
wording is both the final written representation of the meaning and
the process by which the final selection is made. Which words
constitute the visible text is certainly determined by the writer
but only within strong lexico-grammatical constraints that the
structure of language, meaning and social communication provide.

But in the field of reading and in the history of reading research
the focus on the wording of language has concentrated on WORD
FREQUENCY, the frequency with which words occur in general in the
language.

As with so many popular notions in education, study of the issue of
word frequency is dominated by the original reasons it was
considered important, and we have not looked objectively at the
realities of wording as the result of text characteristics. Unless
we examine how characteristics of coherent, functional, meaningful
texts relate to choice and frequency of words in those texts we
cannot truly understand the significance of relative frequency of
words in use. So, to put our study in an educational context and to
examine current educational belief and practice, we are making word
frequency WITHIN TEXTS the focus of this paper.

Purposes for Studies of Word Frequency

One of the most deeply rooted notions among teachers of reading is
that controlled vocabulary, based on studies of relative word
frequency, is necessary in instructional materials for developing
readers. This emphasis on using word frequency lists to build
controlled vocabulary materials was not the original purpose for
constructing such lists, however.

Counts of word frequency were first compiled as a means of
determining the readability of existing texts. The researchers were
operating on the premise that words which occur most frequently in
the language are more easily recognized, learned, and processed in
reading than less common words. So it seemed reasonable that the
more high frequency words a text contains proportionate to low
frequency words the easier it would be to read.
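
The premise can be made concrete with a minimal sketch in Python. It
is an illustration only, not a procedure taken from the studies
discussed here: the word list `COMMON`, the tokenizer, and the
function names are all assumptions introduced for the example.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def high_frequency_share(text, common_words):
    """Return the fraction of running words found in a high-frequency word list."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in common_words)
    return hits / len(tokens)

# Hypothetical stand-in for a general-language frequency list.
COMMON = {"the", "a", "and", "to", "was", "dog", "ran"}

share = high_frequency_share("The dog ran to the coyote.", COMMON)
print(round(share, 2))  # 0.83 (five of the six running words are on the list)
```

Under the premise, a higher share would mark an easier text. Note
that the measure looks only at membership in a general list and
ignores everything about the text in which the words occur.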

While using word frequency as a means of DETERMINING readability in
existing texts may seem logical, there is a gap in the logic of
manipulating word frequency to CONTROL readability, by restricting
the vocabulary of new texts or rewriting old texts by substituting
high frequency words for low frequency words. Research has
demonstrated a moderate correlation between the proportion of
unusual words and text difficulty even if it has also shown that
this correlation is insufficient to determine readability. Most
readability formulas still include some measures of unusual words.
But artificially reducing vocabulary to create texts with a lower
proportion of uncommon words tampers with the very factors that may
contribute both to word frequency and text difficulty. There are
good reasons why particular words occur in particular places in
particular texts, and the author's choice is only one of these
reasons. Tampering with the wording of texts without understanding
why words occur in texts the way they do may make texts less
readable rather than more.

Over the years a mystique has grown up around the significance of
controlling vocabulary to control comprehensibility.

A major reason for accepting the importance of controlling
vocabulary has been that many teachers and researchers have
operated from a view of reading as getting words and of learning to
read as learning to get words.

John Carroll, author of a comparatively recent study that used a
base of 5,000,000 words, argues that the size of the reader's
vocabulary is the most important causative factor in comprehension
(Carroll, 1981).

Ironically the methodology of counting word frequency has itself
contributed to misconceived applications. Early frequency studies
showed that words vary so much in frequency from one context to
another that it's necessary to examine a very large corpus of
language to be able to support the assertion that the frequency
found is representative of the whole language. Later studies were
based on awareness that the corpus must be a broader representation
than a single source such as the Bible. Such grand scale studies
may or may not provide a list truly representative of "all"
language. But such a large corpus of language containing many texts
eliminates the very factors that constrain the choice and frequency
of vocabulary in a single coherent text. So when the frequency list
is used to construct or rewrite texts it may make them strange and
unpredictable. When the list is used to judge readability of a text
it imposes the assumption that words are of equal difficulty
regardless of where they occur and what kind of contextual support
is provided.

Word Frequency as a Feature of Text

In this study we have put our focus on what grand scale word
frequency studies could not shed light on: what does word frequency
mean in the context of a single coherent and cohesive text not
written or adapted for the purpose of the study? We want to know
what it is about language in use that produces variable word
frequency. Such knowledge will help put the issue of vocabulary in
its proper context (no pun intended). But it will also help to
define a text in terms of its use of vocabulary. This will provide
knowledge of the relative importance of any particular word to the
text and text comprehension. It will also suggest how vocabulary is
developed through reading.

John Carroll (1981) reasons that since good comprehension
correlates with good vocabulary, then vocabulary development is
essential to comprehension. A more logical conclusion is that
people who read a lot develop large vocabularies. So an important
corollary question is, "what is there about word use and frequency
in texts that builds vocabulary during reading?"

Our study is rooted in a psycholinguistic theory of reading and it
draws on data from past miscue studies (Goodman and Burke, 1973;
Goodman and Goodman, 1976).

In this study we've examined word frequency in six complete texts
which have been used in miscue research. Our purpose is to
determine not only the frequency with which words occur in each
text but also to determine why. We are seeking to understand how
the vocabulary of the text relates to its other characteristics,
and what constraints a complete text imposes on its vocabulary.
Such understanding may call into question the use of word frequency
lists in judging readability and in structuring controlled
vocabulary basal readers.

A Historical Summary of Word Frequency Research

Over a long period research efforts have centered on proving which
word frequency list or readability formula was the most effective
and why. The word frequency variables selected to measure text
difficulty were numerous: number of running words, percentage of
different words, percentage of different infrequent, uncommon, or
"hard" words, percentage of polysyllabic words, vocabulary
difficulty, vocabulary diversity, number of abstract words, number
of affixed morphemes, and so on (Lorge, p. 82).

Vocabulary control is not a new idea. Lorge traces word and idea
counts back to the Talmudists in 900 A.D. who used frequency of
occurrence to distinguish usual from unusual meanings. Nor is
interest in word frequency comparatively recent in the United
States: in 1840 the McGuffey Readers were claimed to contain words
carefully selected for "ease of understanding," though criteria for
selection were not made explicit.

N. A. Rubakin and F. W. Kaeding compiled word lists in 1889 and
1898 respectively. Kaeding set the precedent for using actual word
counts to produce lists of words in order of frequency of
occurrence (Lorge, 1944).

While the early word counts all produced frequency lists they
varied considerably in terms of the language sample from which they
were drawn. Relying primarily on Bible passages, Knowles produced a
three hundred and fifty word basic vocabulary for the blind.
Eldridge's (1911) list of SIX THOUSAND COMMON ENGLISH WORDS was
drawn from four issues of the Buffalo, New York Sunday papers dated
July and August 1909 (Klare, 1963).

Ernest Horn's A BASIC WRITING VOCABULARY (1925), a list of
approximately five million words, was based on personal and
business correspondence. Next, Horn tackled the job of counting the
spoken vocabularies of young children, ages one to six, and in 1926
published "The Commonest Words in the Spoken Vocabulary of Children
up to and Including Six Years of Age" (Horn, 1926).

It was Thorndike's THE TEACHER'S WORD BOOK, published in 1921,
however, that heralded the dawn of readability formulas.
Thorndike's ten thousand most frequent words served as the basis
for the first significant readability formulas and controlled
vocabulary readers (Thorndike, 1921).

Lively and Pressey, in 1923, calculated the vocabulary burden of a
book by selecting a thousand-word sampling, assigning each word an
index of difficulty that corresponded with the Thorndike list, and
then computing the weighted median index number for the passage.
Thus, the index numbers were based strictly on the frequency of the
use of words (Lively and Pressey, 1923).
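
A weighted median of this kind can be sketched in a few lines of
Python. This is a hedged illustration of the general idea, not a
reconstruction of Lively and Pressey's actual procedure: the index
values in `INDEX`, the value `OFF_LIST` given to words missing from
the list, and the convention that a lower index means a more
frequent word are all assumptions made for the example.

```python
from statistics import median

def weighted_median_index(tokens, index_of, off_list_index):
    """Median of the index numbers taken over every running word.

    Counting each occurrence separately is what weights a word's
    index by how often the word is used in the sample.
    """
    values = [index_of.get(tok, off_list_index) for tok in tokens]
    return median(values)

# Hypothetical index numbers: lower means more frequent in the general list.
INDEX = {"the": 1, "dog": 2, "ran": 2}
OFF_LIST = 10  # assumed value for words that are not on the list

sample = ["the", "dog", "the", "coyote", "ran"]
print(weighted_median_index(sample, INDEX, OFF_LIST))  # 2
```

Whatever scale is used, the result depends only on the list and on
how often each word is repeated, not on what the words mean in the
passage.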

Lively and Pressey's work sparked the interest of other researchers
including Washburn and Vogel, who in 1928 declared that the number
of different words in a thousand was the most reliable indicator of
passage difficulty because of its close correlation to median
reading scores obtained from the paragraph-meaning section of the
Stanford Achievement Test (Washburn and Vogel, 1928).

William S. Gray, father of the basal reader, cautioned: "It is
reasonable to assume that the number of different words used is a
fair measure of difficulty, because it indicates the range of
concepts involved. It fails, however, to consider whether the words
used represent relatively simple or difficult concepts" (Gray, p.
492, 1947).

This is a serious oversight of the early vocabulary lists. Not only
did they fail to consider the difficulty or simplicity of the
concepts represented by the words, but they overlooked meaning and
meaning variation completely.

The vocabulary studies that followed in rapid-fire succession
employed various methods of compiling words. Another 1928 study,
conducted by Dolch, analyzed textbook series according to five
indices of difficulty: percentage of different words; percentage of
difficult words (using his combined word study list); degree of
difficulty of words; [...] of difficult words; and degree of
difficulty for supplementary reading. Difficulty was equated with
infrequency (Dolch, 1930).

Next was Lewerenz's study in which he focused on words beginning
with w, h, b, i, or e. He reported that words that start with w, h,
or b occurred with relative frequency and could be classified as
easy words, while words that begin with i or e were relatively few
and therefore difficult (Lewerenz, 1929).

Johnson relied on still another index of vocabulary difficulty: he
reported that the percentage of polysyllabic words in a passage is
a reliable indicator of the reading difficulty that children will
experience (Johnson, 1930).

One year later, in 1931, Patty and Painter presented a modified
version of the Lively-Pressey method by listing all words located
on the third complete line of each fifth page, multiplying the
corresponding Thorndike index numbers of the selected words by the
frequency of the use of the respective words, and finally,
calculating the average-word-weight value by dividing by the total
number of words in the sample (Patty and Painter, 1931).

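
The arithmetic described above (index number times frequency of
use, summed, then divided by the number of words sampled) can be
sketched as follows. The index values in `THORNDIKE_INDEX` and the
value `OFF_LIST` for unlisted words are hypothetical placeholders,
and the sampling of lines and pages is left out; this is only an
illustration of the averaging step.

```python
from collections import Counter

def average_word_weight(sample_tokens, index_of, off_list_index):
    """Sum of (index number x frequency of use) divided by the sample size."""
    if not sample_tokens:
        raise ValueError("the sample must contain at least one word")
    counts = Counter(sample_tokens)
    weighted_sum = sum(
        index_of.get(word, off_list_index) * n for word, n in counts.items()
    )
    return weighted_sum / len(sample_tokens)

# Hypothetical index numbers standing in for values from a frequency list.
THORNDIKE_INDEX = {"the": 1, "dog": 2, "ran": 2}
OFF_LIST = 10  # assumed value for words that are not on the list

words = ["the", "dog", "the", "coyote", "ran"]
weight = average_word_weight(words, THORNDIKE_INDEX, OFF_LIST)
print(weight)  # (1*2 + 2 + 10 + 2) / 5 = 3.2
```

Unlike a median, this mean is pulled strongly by a single off-list
word, which is one reason the choice of statistic mattered to the
formula builders.
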
In keeping with the vocabulary analysis tradition, Thorndike
himself produced still another technique in the thirties based on
his own word list. Using a sample of ten thousand words from the
book to be analyzed, he counted the number of words it contained
that were in the various categories of the TEACHER'S WORD BOOK, and
then calculated the norms for each grade (Thorndike, 1932).

Klare characterizes this early period of readability research as
follows: 1) primary attention paid to vocabulary (frequency) as a
basis for predicting readability; 2) dependence upon Thorndike's
TEACHER'S WORD BOOK as the basis for determining vocabulary
difficulty; 3) use of "relatively crude criteria of reading
difficulty" (Klare, 1963, p. 44 f.).



Meanwhile, new approaches to vocabulary sampling were turning out
still more word lists. In 1936, Buckingham and Dolch utilized a
free association technique in which 21,000 children in grades two
through eight were asked to write all the words which they thought
of during a fifteen-minute period. The result was a pool of two and
a half million words which were then tabulated by lexical unit
according to Thorndike's procedures.

In 1938, Rinsland and Moore, after collecting nearly six million
written words from school children, announced their proposal for a
list: "to assemble all data and words... into a consolidated list
of approximately 15,000 different words with eight columns of
frequencies for the eight grades after each word" (Lorge, p. 547,
1946).

It was Dale, however, who came closest to fitting the bill for a
graded word list. He based his work on the premise that ninety
percent of the children entering fourth grade would know some
meaning of a vocabulary selected from previous word counts
including Gates' A READING VOCABULARY FOR THE PRIMARY GRADES, all
words from A STUDY OF THE VOCABULARY OF CHILDREN BEFORE ENTERING
THE FIRST GRADE, and the first thousand most frequent words in
Tidyman's A SURVEY OF THE WRITING VOCABULARIES OF PUBLIC SCHOOL
CHILDREN. His testing procedure was quite simple and
straightforward. He asked almost 0,000 children in grades 4, 6 and
8 to state whether or not they knew a given word. In 1943, Dale
published a list of easy words based on the study (Dale, 1943).

Another major word count was conducted by Lorge in an effort to
obtain an estimate of the frequency of the occurrence of words in
adult reading material. Drawing from five adult magazines with high
circulation, SATURDAY EVENING POST, LADIES' HOME JOURNAL, WOMAN'S
HOME COMPANION, TRUE STORY and READER'S DIGEST, Lorge pooled
approximately five million running words. These were then combined
with the Thorndike 20,000 Word Book, the Thorndike Juvenile
Literature Count, and the Lorge-Thorndike Semantic Count and
published as THE TEACHER'S WORD BOOK OF 30,000 WORDS (Thorndike and
Lorge, 1944).

Lorge himself suggested that word lists can be used most
effectively in establishing a core vocabulary for children. He
warned, however, that they "cannot be the ONLY basis of
selection...," as they fail to account for the meanings of words.
Even the most frequent words commonly have more than one meaning.
For example, Lorge says the 850 WORDS OF BASIC ENGLISH represent
12,425 listed meanings in THE OXFORD DICTIONARY with approximately
5,991 additional senses that are not separately listed.
Furthermore, Lorge said, each reader brings his or her own
background of experience to the text, and the meaning the writer
intended for a particular word or passage may not be the meaning
the reader receives (Lorge, 1944).

Everyone seemed to agree right from the start that frequency of
occurrence was important due to the common sense notion that common
words are easier to recognize. So the next concern was the nature
of the language sample from which the words were drawn. Everything
from the Bible to popular adult magazines to the Buffalo Sunday
newspaper was used. While some researchers swore by samples
collected from written language, others (e.g. Ernest Horn) insisted
that oral language was the best source. Still another variable was
the age of the subjects from which the language was collected. The
subjects ranged in age from preschool to adult.

Although there was wide variation in the nature of the sample, the
basic approach to the collection was the same. A large number of
words were generated over a range of related texts from a
particular source.

The validity of grand-scale word counts was essentially assumed and
never seriously questioned. When people began to realize that word
counts alone were not adequate for readability, attention gradually
shifted to the development of readability formulas that did
incorporate other criteria besides vocabulary such as complex
versus simple sentences, sentence length and qualitative factors
including obscurity and incoherence in expression. But basal texts
continued to focus strongly on two factors: controlled vocabulary
and repeated exposure, a contribution from behavioral psychology.

The essential weakness of all this word counting is that word
frequency is treated as a phenomenon that exists independently of
the text in which it occurs. Word frequency has been treated as a
cause of text difficulty but not as a result of characteristics of
the text itself.

Early research established that very large amounts of text must be
used to get some sense of the relative frequency of words in
general. But, as we said earlier, using huge bodies of language
with millions of running words and thousands of different words
blots out the characteristics of a text which determine the choice
of words and their frequency.

Though authors have some choice in the words they use in creating a
text, there is always a considerable amount of constraint on that
choice. Some syntactic features of the language are extremely
constraining. Common nouns, particularly in the singular, in
English almost always require determiners. So THE and A are going
to be very frequent in all English texts. THE will be more frequent
than A because THE has an anaphoric quality; it is used with nouns
already introduced in the text. Some semantic features of a text
serve an essential and repeated purpose. If the text contains
dialogue, SAID will occur very often. This explains why it is often
the most common verb in a text.

But other semantic constraints derive from the message or meaning
being represented. So a story about a sheep dog defending her flock
against predatory coyotes will make frequent use of some words not
likely to be common in even several million words from school
texts, several Sunday editions of the Buffalo Sunday paper, or many
other sources. Such frequency doesn't make the text hard to read.

This is why in our study we've focused on a very different set of
questions. We have examined word frequency in the context of
connected discourse, looking at the choice and frequency of words
in relation to other text characteristics.

We have asked: What is REALLY happening in a text? Why are some
words in a text more frequent than others? How are the words
related to each other, semantically, syntactically? How do the
words function syntactically? Do some words serve more than one
syntactic function? How does word frequency relate to text
cohesion?

Description of the texts selected for this study 

The texts we have selected include two middle grade basal stories,
S51, "Freddy Miller, Scientist" (fifth grade) and S53, "My Brother
Is a Genius" (sixth grade). S59, "Sheep Dog," is a selection from
an eighth grade literature book. S60, "Poison," is a story by Roald
Dahl published in unabridged form in a twelfth grade literature
anthology. S70, "Ghost of the Lagoon," written by Armstrong Sperry,
appears in its original form in a sixth grade reader. S61, "Why We
Need the Generation Gap," is an adult magazine essay.

Operating from a theory that language controls its own vocabulary,
we have examined word frequency in these texts. All have similar
characteristics but are different too, depending on the author's
purpose and style. We chose these particular stories because we
have lots of miscue data on subjects reading them, allowing us to
compare across texts with some degree of sophistication. We have
purposely avoided using beginning reading material as it tends to
employ rigidly controlled vocabulary.

Story    Running   Different   Words       Typ/Tok   % Words
Number   Words     Words       Used Once   Ratio     Used Once
------   -------   ---------   ---------   -------   ---------
S51      1369      466         263         2.94      56.44
S53      2030      604         336         3.36      55.63
S70      2775      809         457         3.43      56.49
S59      3667      952         507         3.85      53.26
S60      4208      883         499         4.77      56.51
S61      1318      608         459         2.17      75.49

Table One provides some general data about the word frequencies in
the six stories. Story length in terms of total running words
(tokens) is proportional to the grade level of the school
selections. S51, the fifth grade story, has only 1369 total running
words. The sixth grade stories, S53 and S70, have 2030 and 2775
respectively. The story from an eighth grade text, S59, has 3667
words while S60, the adult short story from a 12th grade anthology,
has 4208 words. The magazine essay is shorter with 1318 words.

The number of different words (types) also increases in materials
for more advanced readers. So types increase from 466 in S51 to 604
in S53 to 809 in S70 and 952 in S59. But in S60 the number of types
actually is lower. This relates to a steady increase in the
type/token ratio (computed here as running words per different
word) for successively more advanced stories, from less than 3 uses
per type in S51 to almost 5 in S60. We might expect the most effect
of vocabulary control and deliberate repeated use of vocabulary in
the two stories from the Betts readers. In fact the rising
type/token ratio suggests that in less consciously controlled
narrative material some words occur very frequently, in fact more
frequently than in more controlled texts.

S61, the magazine essay, shows a very different pattern, however,
with 608 types for 1318 tokens and a ratio of only 2.17. This
probably represents the difference between this non-narrative text
and the others, which are all narrative.

We must examine another aspect of word frequency in these texts to
get a more complete picture. In every word study, no matter how
large the corpus, many words will be found to occur only once. In
all of these stories more than half the types occurred only once.
In fact all of the five narrative stories show similar percents of
word types used only once, 53.26% to 56.51%. It's not surprising
that better than 75% of the types in S61, the essay, occur only
once considering the low type/token ratio. One dimension of the
wording of these narratives, then, is that more than half of the
types occur only once. But an opposite dimension is that a few
words occur extremely often.
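
The measures reported in Table One can be computed for any text
with a short Python sketch. The tokenizer and the name
`text_profile` are assumptions made for illustration; the report
does not say how words were segmented or how inflected forms were
grouped, so counts from this sketch need not match the published
figures.

```python
import re
from collections import Counter

def text_profile(text):
    """Compute running words, different words, uses per type, and share of types used once."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        raise ValueError("the text contains no words")
    counts = Counter(tokens)
    running = len(tokens)        # tokens
    different = len(counts)      # types
    used_once = sum(1 for n in counts.values() if n == 1)
    return {
        "running_words": running,
        "different_words": different,
        "used_once": used_once,
        # The ratio is reported here as running words per different word.
        "uses_per_type": running / different,
        "pct_types_used_once": 100 * used_once / different,
    }

profile = text_profile("the dog saw the coyote and the coyote saw the dog")
print(profile["running_words"], profile["different_words"])  # 11 5
```

Applied to the S51 figures given above, the same arithmetic yields
1369 / 466, or about 2.94 uses per type.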

Table Two: Words Representing % of Total Tokens

                 Total
                 Running    Diff.
                 Words      Words     Cumulative Percent of Running Words
Stories          (Tokens)   (Types)   10%     20%     30%     40%     50%

S51   Words      1369       466       3       8       16      31      53
      % Types                         0.64    1.72    3.43    6.65    11.37

S53   Words      2030       645       3       7       14      26      52
      % Types                         0.47    1.09    2.17    4.03    8.06

S70   Words      2775       809       2       6       13      30      64
      % Types                         0.25    0.74    1.61    3.71    7.91

S59   Words      3667       952       1       5       11      27      64
      % Types                         0.11    0.53    1.16    2.84    6.72

S60   Words      4208       883       2       6       12      27      58
      % Types                         0.23    0.68    1.36    3.06    6.57

S61   Words      1318       608       3       8       17      38      85
      % Types                         0.49    1.32    2.80    6.25    13.98

To have a complete picture of relative frequency of these very
frequent words in these six texts, we need to look at the number of
different words (types) it takes to account for cumulative percents
of the running words (tokens). This information is indicated in
Table Two. It takes only from 1 to 3 words in any of these six
texts to account for 10% of the tokens. The most common word, THE,
accounts for between 3.9% (S53) and 9.9% (S59) of all tokens.

To account for 20% of the tokens takes only 5 to 8 different words.
That's from .51% to 1.71% of the types. It takes 11-17 types to
account for 30% of the tokens. This is only 1.2 to 3.4% of the
types. The latter figure is for the fifth grade text (S51). To
account for 40% of the running words takes only 2.8 to 4% of the
types except for S51 (6.7%) and the essay, S61 (6.3%). To
understand what this means consider that for 27 words in S60 to
account for 40% of the total of 4208 words each of these 27 words
occurs an average of 63 times.

Half of the tokens in each text are represented by 6.6 to 8.1% of
the types except for S51, which requires 11.4%, and S61, requiring
14%. Clearly these two selections are getting into lower frequency
words than the other four. Each of the 58 words that account for
the first 50% of S60 occurs an average of 36 times. Each of the 53
words that account for the first 50% of S51 occurs an average of
12.9 times. And for S61 each occurs only 7.8 times.
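
The "Words" rows of Table Two answer one question: how many of the
most frequent types are needed to cover a given percent of the
running words? A minimal Python sketch of that calculation follows.
The function name `types_needed` and the toy sentence are
assumptions for illustration, and ties between equally frequent
words are broken arbitrarily.

```python
from collections import Counter

def types_needed(tokens, percent):
    """Smallest number of most-frequent types that covers `percent` of all tokens."""
    if not tokens:
        return 0
    counts = sorted(Counter(tokens).values(), reverse=True)
    covered = 0
    for rank, n in enumerate(counts, start=1):
        covered += n
        # Integer comparison avoids floating point rounding at the threshold.
        if covered * 100 >= percent * len(tokens):
            return rank
    return len(counts)

tokens = "the dog saw the coyote and the coyote saw the dog".split()
coverage = {p: types_needed(tokens, p) for p in (10, 20, 30, 40, 50)}
print(coverage)  # {10: 1, 20: 1, 30: 1, 40: 2, 50: 2}
```

The average-use figures follow the same arithmetic: half of the
4208 running words of S60 spread over 58 types is 2104 / 58, or
about 36 uses each.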

What all this illustrates is the extreme variability of word
frequency within a text. More than half the individual words
(types) occur only once in any of these six texts while small
numbers of types account for huge proportions of the total running
words (tokens).

So far, however, we have not looked carefully at which words appear
so frequently and what text characteristics might account for their
frequency. Figures 1a and 1b show the 25 most common words in each
story (in four cases we include more due to matching frequencies).

The words on the list of the 25 most frequent words in each story
represent from 35 to 39% of the running words of each text.

Figure 1a: Most Frequent Words in Frequency Order

Figure 1b: Most Frequent Words in Frequency Order

Of these eight words all are function words. IT and THAT can also
function as pronouns. Even THE, the most common word in all 6
texts, ranges from 3.3% to 9.8% of the running words in each text.
This illustrates a key feature of word frequency in connected
texts: VARIABILITY WITHIN CONSTRAINT. The language requires the use
of THE but it permits sufficient variation to allow considerable
range.

The most common words may be divided for purposes of analysis into
these main kinds:

1. Function words
2. Copula
3. Pronouns
4. Content words

Function words include:

determiners (the, a),
verb markers (was, had, were, will, are, is, can),
conjoiners (and, as, that, but, when),
prepositions and particles (to, in, of, with, at, for, into, from,
on, up, out),
others (it, there, not)

One simple reason for the frequency of many function 
words is that, while the grammar of the language requires 
their functions, there are only a few words in the language 
which can fulfill each function. Only a few words can be 
determiners. There are few conjunctions and other 
conjoining elements in the language. There are me*e 
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prepositions but they still represent a finite set of 
words. Furthermore, while the language adds to its store 
of content words it does not add to its store of function 
words. Yet they are the binding material which makes the 
language cohesive and coherent. 

To illustrate this, Table 3 shows the percentage of 
each type of function word in each of the six texts. 

Table 3 Percent of Total Running Words 

Function Word Type    S51    S53    S70    S59    S60    S61 

Noun Marker           8.7    7.7    12.8   12.3   8.2    ... 

(The intermediate rows, which include verb markers and 
clause markers, are not legible in this copy.) 

Other                 0.4    1.2    0.3    0.5    0.7    0.7 

Total                 32.7   35.1   37.6   38.7   36.4   38.9 


From 32.7 to 38.9% of each text's running words are 
function words. The terms we use here to describe the 
various functions are those of C. C. Fries. We prefer them 
for this purpose because of their descriptive reference to 
what they do (Fries, 1952). 

The noun markers are few, mostly THE and A (AN), but 
they represent from 7.3 to 12.8% of the running words. The 
phrase markers (prepositions) are more common but still 
represent a small set of words. These words also serve as 
verb particles. Contrast "he ran up the street" with "he 
ran up the flag." The former is a phrase marker, marking a 
prepositional phrase. The latter is a verb particle, part 
of the verb, RAN UP. In these combined functions, this set 
of words represents from 10 to 14% of the running words in 
each text. 

There is substantial variation from text to text in 
use of conjunctions; S60 uses two and a half times as many 
as S51. But together with clause markers, which introduce 
subordinate clauses, conjunctions account for 5.1% to 8.2% 
of each text's running words. Again, a very small set of 
words in the language carries a big part of the running 
text. 

The words which serve as copula are the BE forms. BE, 
WAS, WERE, IS, ARE show among the most common in these six 
texts. These words also can serve as VERB MARKERS. Which BE 
forms appear as COPULAS or VERB MARKERS depends very much 
on the prevailing tenses in the text, which in turn is 
determined by whether the text is about the past, present or 
future. So S51 and S70 show only WAS and HAD among their 
most common words. S53 and S60 list just WAS. S59 has WAS, 
HAD and WERE. But the essay S61 shows WILL, HAVE, IS, BE, 
ARE among its most common words. 

Pronouns are clearly common among the most frequent 
words in each text. That's because the language requires 
the use of pronouns for recurrent nouns. IT is common in 
all our texts, but which other pronouns are used depends on 
characteristics of the text. This is well illustrated in 
S59, Sheep Dog. The central character is a female dog, 
Peggy. SHE and HER occur 105 times each and tie for fourth 
and fifth most common word in the text. HE and HIS occur, 
but only 20 times each. 

S53, My Brother is a Genius, has predominantly male 
characters and is told in the first person. So among its 
most common words are: I, HE, YOU, MY, HIS. S70 also has 
male characters but is told in third person so its common 
words include these pronouns: HIS, HE, HIM. S51 has both 
male and female main characters and quite a bit of 
dialogue. Its common words include: HE, I, YOU, SHE, HIS. 
The essay S61 uses a great deal of first person plural to 
represent a generalized society: "When we..." So it's not 
surprising that these pronouns are among the most common 
words: WE, OURS, US, THEY, I, THEIR, WHO, ONE. 

To sum up, pronouns are important cohesive elements 
but which ones are common in any text depends on text 
characteristics such as cast of characters, dialogue, and 
whether it's first person or third person narration. 
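
A pronoun profile of this kind can be tallied with a few lines of Python. The sketch below is illustrative only: the pronoun inventory is an assumed, incomplete list, the two sample sentences are invented, and possessives are counted as pronouns here for simplicity even though the study treats them as noun modifiers.

```python
from collections import Counter

# A small, assumed pronoun inventory; possessives are included here for
# simplicity even though the study counts them as noun modifiers.
PRONOUNS = {"i", "me", "my", "we", "us", "our", "you", "your", "he", "him",
            "his", "she", "her", "it", "they", "them", "their"}

def pronoun_profile(tokens, top=5):
    """Return the most frequent pronouns in a list of lowercased tokens."""
    counts = Counter(token for token in tokens if token in PRONOUNS)
    return counts.most_common(top)

# Invented sentences standing in for a first person and a third person text.
first_person = "i told my brother that i would read to him".split()
third_person = "she watched her sheep and she turned to her mate".split()
print(pronoun_profile(first_person))
print(pronoun_profile(third_person))
```

Run on whole stories, the two profiles would differ in the way described above: the narration and the cast of characters decide which pronouns rise to the top.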

Possessive pronouns are actually the most common noun 
modifiers. In fact function words acting as "pro" elements 
can take the place of any of the content words, not just 
nouns. Verb phrases may be replaced by verb markers: "Will 
you get it? Yes, I will." Adverbials may be replaced by 
prepositions: "He walked in and looked around." 

All these text characteristics explain the frequency 
of function words. But they also explain the surprising 
infrequency of content words. 

Nouns are the only content words to appear in any 
number among the lists of most frequent words in Figures 1a 
and 1b. Here are the nouns that appear: 

S51: FREDDIE, ELIZABETH, MILLER, UNCLE, MOTHER, 
FATHER. 

S53: BARNABY, BABY, ANDREW. 

S70: MAKO, CANOE, BOY, TUPA, AFA, WATER. 

S59: PEGGY, SHEEP, COYOTE, COYOTES, BAND. 

S60: HARRY, GANDERBAI. 

S61: CHILDREN. 

It's not surprising that in each of the narrative 
texts the most common noun is the name of one of the 
characters. In three of them it's the principal character but 
in S53 and in S60 it is not the main character because 
these are first person stories. In fact, in S53 the main 
character is never named. What is more surprising is that 
the most common nouns in these stories are not necessarily 
common in the language. Only BABY, BOY, and CHILDREN could 
be considered truly common. And some really uncommon nouns 
appear among these most frequent words: CANOE (to cross 
the lagoon), SHEEP, BAND (the group of sheep) and COYOTE 
and COYOTES (pair of adversaries). 

The essay, S61, had only one noun, CHILDREN, among its 
most common words. Only three nouns occurred more than 
four times in the entire text: CHILDREN (9 times), 
GENERATION (6), and AGE (5). It is apparently possible to write 
an essay without using the same nouns very often, 
particularly since there are main ideas but no main characters. 

What about other content words? S51 has a verb 
modifier, THEN, a kind of noun modifier, MRS., and the verb 
SAID. UNCLE actually appears fairly often as a noun 
modifier. Mrs. Miller keeps telling Freddie he's "just like 
UNCLE...." S53 has SAID, a verb, and TYPICAL, a noun 
modifier. The story centers around whether Andrew is a typical 
baby. HAD, used as a verb, is the only non-noun content 
word among the most frequent list in S70 and S59. THERE 
and NOW as verb modifiers are among the most common words 
in S60 and SAID is the only verb. HAVE, sometimes a verb, 
is the only non-noun content word on the S61 list. 

Only five verbs in S51 occur five times or more in the 
entire text. These are SAID, THOUGHT, GET, KNEW, and 
CALLED. In S53 the five most common verbs (six times or more) 
are SAID, THINK, SEE, KNOW, and GO. The five most common 
verbs in S70 (occurring five times or more) are SAW, COME, 
LEAPED, HEARD, and ROSE. The contrast between this more 
active set of verbs and those in S51 and S53 also shows in 
S59. The most common verbs in that are TURNED, SAW, 
LEAPED, LOOKED, MADE (6 times or more). S60, with much tension 
but little action, has these five most common verbs (13 
times or more): SAID, WENT, MOVE, LOOKED, STOOD. These 
verbs occur three times or more in S61: FIND, SUSPECT, 
KNOW, BECOME, DO, JOIN, SEEN. 

While these verbs provide interesting insight into the 
content of each text they show also that few verbs are 
frequent across texts and few verbs are frequent within 
texts. SAID, of course, will be common where there is 
dialogue. 

Few verb modifiers occur with any great frequency in 
any of the texts. THEN is relatively frequent in all texts 
except S61. THERE, sometimes a verb modifier, is also 
found with moderate frequency in most but not all of the 
texts. NOW is found several times in three of the texts. 
Beyond that, the verb modifiers that occur more than two or 
three times are specific to the text. The five most common 
verb modifiers in S70, involving the killing of a shark, are: 
THEN, AWAY, AGAIN, BEFORE, QUICKLY. S59, with the fighting 
of the dog and coyotes, has a similar list: THEN, AGAIN, 
SLOWLY, FORWARD, CAREFULLY. And the very suspenseful S60 
shows: SLOWLY, AGAIN, CAREFULLY, QUICKLY, SHARPLY. 

Noun modifiers other than possessive pronouns are even 
more varied. Few occur more than five times even in the 
longer texts. Not all of the more common noun modifiers 
are adjectives. In S59 COYOTE and SHEEP are used five or 
more times as noun adjuncts. BEDDING, verb derived, occurs 
five times (THE BEDDING SHEEP). Again the lists of more 
common noun modifiers show their particularness to each 
text: S51 shows Freddie's problem experiments: DARK, SMALL, 
BAD, PROUD, QUEER. S70's list shows the shark fight theme: 
GREAT, WHITE, OLD, GREEN, DEAD. S61 has HUMAN, POLITICAL, 
VIETNAM, and GOLD (that's the Generation Gap). 

Table 4 Percent of Running Words in each Grammatical Category 

Grammatical Category    S51 

Pronouns*               9.3 
Other Nouns             21.5 
Total Nouns             30.8 
Verbs                   17.6 
Noun Modifiers*         10.2 
Verb Modifiers          4.6 
Function Words          32.7 
Indeterminate           0 
Contractions            2.3 

*Possessive pronouns are included as noun modifiers. 

To put this information about the relative frequency 
of different grammatical categories of content words into 
perspective, Table 4 presents the distribution of each 
category in each entire text. 

It's interesting to note that the total percent of 
noun positions in these six texts only varies from 27.9% to 
30.8%. Yet the texts vary considerably in what part of 
those noun positions are filled by pronouns, from 4.3% to 
11.8%. The two first person stories, S53 and S60, have 
similar high pronoun percents, 11.6 and 11.8 respectively. 
These two stories have sharply lower percents of other 
nouns. The rest of the variation in use of pronouns and 
nouns seems to reflect amount of dialogue and other 
stylistic factors. English clauses and sentences require 
nouns as subjects, direct and indirect objects, objects of 
prepositions, etc. The proportion, at least in these 
texts, seems to vary little. But other factors, some of 
which the author may control, appear to decide how many 
nouns are replaced by pronouns. 

Verbs show less variation, from 15.3 to 18.4%. S70 
and S59, the two texts with the lowest rate of verbs, have 
little dialogue because of text factors. In S59 there are 
no human characters in much of the story. In S70 a 
considerable part of the story involves only Mako, a boy, his 
dog, Afa, and Tupa, a great white shark. So, whereas SAID 
occurs 51 times in S53 and ties for fifth most common word, 
it occurs only five times in S59 and twice in S70. 
Representation of oral dialogue in written text requires a 
special grammar which includes an extra clause representing 
at least the speaker and some representation of the verb 
SAID. 

The amount of dialogue present also seems to explain 
the variation in the relative amounts of contractions in 
each text since most of the contractions appear in 
dialogue. S53 with the most dialogue has 4.8% contractions. 
S70, S59, and the essay S61 with little or no dialogue have 
only .6% contractions each. 

There is a very specific textual reason for the 
percent of words with indeterminate grammatical function in 
S53. The central plot of the story is about an 8-month old 
baby learning to say big words by listening to his older 
brother read words from the dictionary. So words like 
PHILOSOPHICAL and INTELLECTUAL occur as word names out of 
syntactic context and are classified as indeterminate. 
That adds up to 7% of the running words of the text, in 
contrast to negligible proportions for the other texts. 

Noun modifiers and verb modifiers vary moderately in 
proportion from text to text, apparently for stylistic 
reasons. Possessive pronouns, included in noun modifiers, 
range from 2.8 to 3.5% of each text. The grammar of 
English requires neither noun modifiers nor verb modifiers to 
produce grammatical sentences. The meaning the author is 
representing may require a good deal of describing and 
qualifying but how much is clearly a function of the 
author's purpose and style. S60 contains a lot of terse 
dialogue. One central character, Harry Pope, thinks he has 
a poisonous snake resting on his abdomen so he's minimizing 
his speech and movements in order to avoid startling the 
snake. This leads to fewer noun modifiers. In the 
essay, S61, there are more noun modifiers because the author 
uses a lot of embedding transformations to produce long, 
complex clauses and sentences. He also uses more adverbial 
clauses than adverbs. So he has a higher proportion of 
noun modifiers and a lower proportion of verb modifiers. 
His text is at the high end in use of function words, which 
also reflects its syntactic complexity. Table 3 shows this 
text has the highest percents of clause markers and verb 
markers among the function words. 

To summarize this discussion of the distribution of 
grammatical categories in these texts we can make the 
following statements. 

The syntax requires some proportional distribution of 
these grammatical categories within the texts but other 
text characteristics including semantic structure of the 
story and the author's purpose and style produce some 
variations among the texts in these proportions. Some very 
common grammatical functions can only be filled by a 
relatively small set of words, so these words are likely to be 
common in any text. Function words and pronouns (including 
possessive pronouns) are the principal examples. 

On the other hand the categories of content words, 
nouns, verbs, noun modifiers, and verb modifiers, are much 
larger classes of words often called "open" classes because 
the language is continually adding to them. Still, the 
characteristics of particular texts exercise some 
constraint on the choice of words to fill these grammatical 
slots. Particularly proper nouns, the names of characters 
in the story, are likely to be among the most frequent 
words. There is a similar but more moderate influence on 
verb frequency. Narratives with lots of action will select 
verbs of movement while suspenseful texts will choose 
another set. Still, SAID is the only verb likely to become 
very frequent. 

In the case of all content words there is a counter 
pressure to the factors causing some words to occur more 
frequently than others. That's the rhetorical value that 
authors in the English language place on using varied terms 
and alternate ways of representing the same referents. We 
don't like to keep using the same nouns, verbs, adjectives 
or adverbs over and over and we'll even avoid using the 
same sentence patterns repeatedly. 

MULTIPLE MEANINGS 

Large criticized word lists for their failure to 
account for multiple meanings of words. This failure does 
appear to be a major shortcoming, especially when you 
consider that the many meanings of even a common word such 
as RUN fill a dictionary page. However, within the confines 
of the single texts we examined, multiple meanings for 
particular words seldom occur. In fact, after examining our 
six texts, we were able to find only one word, ALLOWANCE in 
S51, that has two clearly different meanings in the story 
itself. In one instance, Mrs. Miller, chiding Freddie for 
ruining his sister's doll, says, "I want you to save half of 
your ALLOWANCE for it each week." In the other, after 
Freddie has used his scientific ingenuity to free his sister 
from a dark closet, Mrs. Miller says proudly, "After this 
we must make some ALLOWANCE for experiments that do not 
turn out so well." 

Our finding is somewhat surprising in light of the 
fact that these stories do make use of controlled 
vocabulary. Authors of controlled vocabulary texts often use 
words over and over again without regard for a possible 
change in meaning. 

While the multiple meanings of a given word may not 
occur in a single text, nevertheless the meaning of a word 
in a particular text may not be a common one and the reader 
may be unfamiliar with the unusual meaning. In S59, for 
instance, the author repeatedly refers to a BAND of sheep. 
Called upon to define BAND out of context, you might think 
of "band of gold," "rubber band," "brass band," and so 
forth, before naming BAND as a term for a group of animals. 
Likewise, you might be hard pressed to come up with the 
meanings out of context for AIR and LIVE that appear in S53 
in relation to television. Mr. Barnaby bemoans the fact 
that in five minutes they are going "on the air," "with a 
live show." In S60 DRAW and variations DREW and DRAWING 
appear four times but never in the way one would probably 
think of first, to DRAW a picture. Rather, we find the 
following examples: 

1. "He...DREW his breath sharply through his teeth." 

2. "...he stuck the needle through the rubber top of the 
bottle and began DRAWING a pale yellow liquid up into the 
syringe by pulling out the plunger." 

3. "Shall we DRAW the sheet back quick...?" 

4. "Slowly he DREW out the rubber tube from under the 
sheet." 

Within a given text, an author may use words in 
unusual ways either frequently (BAND of sheep in S59) or 
infrequently (BODY of the island in S70). But the meaning of 
any word is always derived from the context in which it is 
embedded. 

Almost any word can be used metaphorically. The 
authors of our six stories employ metaphor to greater or 
lesser extents. The metaphorical uses of common words, BODY 
(of the island), FACES (of the cliffs), ARMS (of the 
island) in the opening passage of S70 are descriptively 
powerful but textually unpredictable. S59 begins with a 
string of vivid metaphors: 

The rays of the setting sun lingered over the 
high Arizona desert, touching the rocky tip of Badger 
Mountain and tinting the bold face of Antelope Rim. 

What is clear is that the particular meaning of a word 
in a text, whether literal or metaphoric, may not be 
predicted from the word's general frequency. Common words may 
be used in quite uncommon ways. 

Text Cohesion 

The wording of a text is strongly influenced by the 
need for the text to be syntactically and semantically 
cohesive, that is to have a unifying structure. 

The information we have presented so far shows that 
syntactic cohesion requires some proportionate distribution 
of grammatical functions and that some words will be common 
simply because there are few words to fill very common 
syntactic functions. Determiners, prepositions, pronouns 
are some examples. 

We've also seen some evidence of the influence that 
maintaining semantic cohesion has on text wording and word 
frequency. But this is more complex as it relates to 
choice of content words, synonyms, and "pro" elements. 

We can illustrate semantic cohesion by looking at 
approximately 20 opening lines of each text. Each author 
needs to accomplish a good deal in these opening lines to 
set up a cohesive text and create a semantic structure. 

S51 starts with a lament: "Poor Freddie was in trouble 
again". The author, in the opening 25 lines, focuses on 
creating Freddie's character, his experimenting and the 
constant trouble this gets him into. Freddie's family is 
also introduced and a sub-theme, his mother's comparing 
Freddie with his Swiss uncles, is also established. 

In these opening lines we find these cohesive chains 
(frequency in parentheses): 

FREDDIE (30): FREDDIE (4), HE (5), HIS (5), FREDDIE'S, YOU'VE, 
    I (4), YOU (4), YOUR (2), HIM (2), TINKER (2) 
TROUBLE (7): TROUBLE, TURNED GREEN, POOR, WRECKED, QUEER, 
    BAD, SADLY 
CHEMISTRY (4): CHEMISTRY SET, EXPERIMENT, MIXTURE, CHEMICALS 
ELIZABETH (5): ELIZABETH (2), LITTLE, SISTER, HEARTBROKEN 
MOTHER (10): MOTHER (2), SHE (3), I (2), MRS. MILLER (2), ANGRY 
UNCLES (7): UNCLE AUGUST, UNCLES (2), SWITZERLAND, ONE, THEM, 
    LIKE (FREDDIE) 

There are 30 references to Freddie that use 10 
different words in these 25 opening lines. The abundance of 
dialogue in S51 results in an interesting pattern: Freddie 
is referred to by name, nickname and by first, second, and 
third person pronouns. The semantic cohesion in S51 
results in some words being repeated while at the same time 
the author achieves variety by using alternatives and 
related terms. 
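
The two figures reported for a chain, its total number of references and its number of different terms, can be checked with a short Python sketch. The frequencies are taken from the FREDDIE chain above, with HIM read as 2, which is the value that makes the counts sum to the stated 30; the data layout and function name are our own.

```python
# The FREDDIE chain from the opening lines of S51, as (term, frequency) pairs
# transcribed from the list above. Terms with no number occur once.
FREDDIE_CHAIN = [
    ("FREDDIE", 4), ("HE", 5), ("HIS", 5), ("FREDDIE'S", 1), ("YOU'VE", 1),
    ("I", 4), ("YOU", 4), ("YOUR", 2), ("HIM", 2), ("TINKER", 2),
]

def chain_summary(chain):
    """Return (total references, number of different terms) for a chain."""
    references = sum(freq for _, freq in chain)
    different_terms = len({term for term, _ in chain})
    return references, different_terms

references, different_terms = chain_summary(FREDDIE_CHAIN)
print(references, different_terms)
```

The gap between the two numbers is a simple index of how much a chain relies on repetition as opposed to alternative terms.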

S53 opens with a statement of the problem: 

"If it bothers you to think of it as baby sitting," my 
father said, "then don't think of it as babysitting. Think 
of it as homework...." 

In the opening 22 lines the author creates the 
problem. A school age boy does his homework while caring 
for his baby brother. The older brother, who is the 
unnamed narrator, and the baby brother are established in 
the task. 

NARRATOR (22): YOU (2), MY (4), YOUR (3), I (11), FELLOW, ME 
HIS MOOD (6): FOOLISH, ASHAMED, YELLED (2), SHOUTED, STAY 
BABY (12): BABY (4), BROTHER (2), YOU (2), ANDREW (2), ANDREW'S, 
    HIM 
BABY'S CHARACTER (10): BILLY, SOUNDS, CRY (3), DISTURB, 
    FAULT, SLEEPING, WANT, TRIED, HOLD 
BABYSITTING (S): BOTHERS, IT (B), BABY SITTING (S), DISTURB, 
    STAY, HOME 
HOMEWORK (19): HOMEWORK, PART, EDUCATION (2), STUDYING, 
    DICTIONARY, WORD (3), PHILOSOPHICAL (3), STUDY (2), 
    MEANINGS (3), DEFINITIONS (2) 

The main character here is referred to 22 times, 
almost all in first and second person, requiring only 6 
words and no name. HOMEWORK, a key event throughout the 
story, has 20 references and 10 different words in these 22 
opening lines. 

S70 begins by establishing the setting: "The island of 
Bora Bora, where Mako lived, is far away in the South 
Pacific." The author concentrates on the setting and on Mako, 
his young hero, in the opening 24 lines. There are these 
cohesive chains: 

SETTING (9): ISLAND (3), BORA BORA, SOUTH PACIFIC, IT (3), 
    MAIN BODY 
ISLAND CHARACTERISTICS (13): FAR AWAY, RISES, HIGH, (LIKE) 
    CASTLE, WATERFALLS, FACES, CLIFFS, UPWARD, CRAG (2), 
    EDGE, ARMS, REEF 
WATER (6): SOUTH PACIFIC, SEA (2), WATER, SURF, LAGOON 
MAKO (13): MAKO (4), HIS (4), HE (2), THEY, COMPANIONS, TWO 
AFA: AFA (6) 
MAKO'S CHARACTER (6): CLEVER, MADE, SPENT, BORN, HANDS, 
    HEIGHT 
HARPOON (6): HARPOON, STRAIGHT, ARROW, TIPPED, SPEARS, 
    POINTED 
CANOE (12): CANOE (2), LARGER, OUTRIGGER, SIDE, BOAT, 
    TIPPING, LARGE, HOLD, HOLLOWING, TREE, LONGER 

Thirteen references to Mako, single or with his dog 
Afa, require six words. Half the references use pronouns. 
The characteristics of Mako and the island use many 
references with only one word, CRAG, used twice. 

S59 also begins with the setting: "The rays of the 
setting sun lingered over the high Arizona desert, touching 
the rocky tip of Badger Mountain and tinting the bold face 
of Antelope Rim." In the first 25 lines, the author 
creates both mood and setting while introducing sheep dog 
at work. We find these cohesive chains: 

EVENING (10): RAYS, SUN, TINTING, SETTING, DARKNESS, BEDDING 
    DOWN, NIGHT, DROWSINESS, DARK 
PLACE (10): DESERT, ROCKY TIP, BADGER MOUNTAIN, BOLD FACE, 
    ANTELOPE RIM, BASIN, SALT CREEK WASH, POOL, PATCH 
SHEEP (14): BAND (3), SHEEP (3), 800, LAMB(S) (2), BLEATING, 
    MASS, EWE, HER, FAR-SIDE 
DOG(S) (7): DOG(S) (3), TWO, EARS, HER, MATE 
PEGGY'S CHARACTER (6): PATROLLING, URGED, ALERT, LARGER, 
    TURNED, ASSURED 

These chains use many words once. Only DOG(S), BAND, 
SHEEP, LAMB(S) occur more than once. The main character, a 
sheep dog, is not named until line 27 of the story. 

The author devotes the next 27 lines to creating Peggy 
as the central character. In that sequence this chain 
occurs: 

DOG(S) (3), PEGGY (2), SHE (6), HER (8), BREED, COLLIES, COAT, 
HEAD, EYES (2), DESCENDANT, FOREPAW, TOES, FOOT. 

There are 29 references then to PEGGY, more than one per 
line, yet even after her name is introduced the author only 
uses it twice in this sequence, using 14 pronouns instead. 

S60 begins with creating the mood and establishing two 
main characters in the setting. 

"It must have been around midnight when I drove 
home...." There are these chains in the 26 lines: 

TIMBER WOOD (Narrator) (16): I (12), ME (2), TIMBER (2) 
HARRY POPE (11): HARRY POPE, HIS (2), HE (5), HE'D, HARRY'S, 
    HIM 
SETTING (20): HOME (2), GATES, BUNGALOW, WINDOW, SIDE 
    BEDROOM, DRIVE, STEP(S) (2), BALCONY (2), ONE, TOP, 
    DOOR(S) (2), HOUSE, ITSELF, HALL, ROOM, IT 
DARK (9): MIDNIGHT, SWITCHED OFF, BEAM, SWING IN, LIGHT (2), 
    STILL ON, DARK, SWITCHED ON 
SLEEP (9): WAKE, AWAKE (2), DROPPED OFF, QUIETLY, LYING, BED, 
    MORE, TURN 
ATTITUDE (6): APPROACHED, BOTHERED, NOTICED, CAREFULLY, 
    QUIETLY, LOOKED IN 
MOVEMENT (11): DROVE, APPROACHED, OPENED, COMING, PARKED, 
    WENT UP, TAKE, GOT TO, CROSSED, PUSHED THROUGH, WENT 
    ACROSS 

Three words are enough to represent the narrator 16 
times. But 11 different verbs are used to show movement of 
the main character with no one verb used twice. This shows 
again the text characteristics that make words both common 
and diverse in a connected text. 

S61 starts with establishing two age groups: 
"Recently, I spoke with a man twice my age who expressed great 
faith in the future of American youth." 

Cohesion chains advance the two groups and set a tone 
for the essay. 

YOUTH (16): YOUTH(S) (2), THEM, YEARS, YOUNG (2), 
    TROUBLEMAKERS, WRONG, I, MY, AMERICAN, THEY (2), AGE, 
    MILLIONS, SONS 
ELDERS (6): MAN, TWICE, HE, MATURITY, FATHERS, CYNICISM 
ARGUMENT (8): SPOKE, EXPRESSED, ENVISIONS, THINKING 
    (WISHFUL), WANT, ACCEPT, CYNICISM, PRE-CONCEPTION 
THE GAP (4): DIVIDED, GAP (GENERATION), FUTURE (2) 
ACTIONS (5): MARCHING, FIGHTING, RISK, DROPPING, SHAVING 
SYMBOLS OF MATURITY (7): SHAVING, DROPPING (HEMS), 
    ACCULTURATING, FAMILY, MORTGAGE, PAYMENTS, YES 

Each of these opening sequences illustrates how the 
need for semantic cohesion limits the writer's choices but 
how writers achieve such cohesion while also achieving 
stylistic diversity and a richness of wording. By 
producing a mix of function words, pronouns and varied content 
words, the author builds a cohesive text and builds 
literary style at the same time. 

In each text, the author achieves the semantic and 
pragmatic purposes of the opening lines by staying within 
text constraints while still making use of the rich 
language resources. In S60 few different words are needed to 
refer to the two characters but 14 terms establish the 
house and 11 different verbs impel the reader into the 
story as Timber progresses to Harry's room. 

The author can, to some extent, choose to use fewer 
terms, more common terms, or less varied terms. But 
authors seem to be aware that, as content builds, variety 
adds depth to the comprehensibility without making the 
whole less comprehensible. 

So in S61, when the author uses BABY FOOD, WEED 
KILLER, and CONVERTIBLE DEBENTURES as examples of how youth 
will be acculturated, he knows that his readers can get his 
point without exactly knowing what a debenture is. In fact 
the term may have been chosen deliberately to sound 
technical and boring. 

Armstrong Sperry in S70 prefers variety over 
repetition in representing the CANOE and ISLAND and does 
not avoid unfamiliar terms such as CRAG, REEF, PANDANUS, 
SURF, LAGOON, OUTRIGGER when they seem appropriate. His 
purpose is to create a sense of setting, not to teach an 
island vocabulary. But, in fact, through the use of 
synonyms and related terms in cohesive chains, the author 
creates a context which makes it possible for readers to 
infer meaning and build vocabulary. 

CONCLUSION 

Our study of the wording of texts or, if you prefer, 
word frequency as a text characteristic has demonstrated 
that general word frequency lists can at best tell only 
part of the story. 

If words are frequent across texts it's because the 
language requires them to be. But such words that are very 
frequent in all texts are very few in total number and 
almost all are function words, be forms, and pronouns. 

In the word frequency lists some content words will be 
found considerably higher than others. That's because they 
are used in common ways to refer to common concepts and 
experiences. But in particular coherent, cohesive texts, 
which content words are common depends on the content of 
the text. In narratives the common content words usually 
involve characters' names, some other important nouns and a 
few others. But authors avoid using content words 
repetitiously for stylistic reasons. So cohesive chains 
are built of common pronouns, key content words, and a 
varied set of terms all somehow semantically related in the 
context of the story. 

The wording of any text is thus by no means random. 
In fact texts self-control their vocabulary. For readers 
that means they will build dependable strategies for 
dealing with words common in the text but less common in 
the language as a whole. They will also build strategies 
for knowing the relative importance to text comprehension 
of particular words and terms. And of course they will 
build strategies for expanding their vocabularies through 
the reading of naturally worded texts. 
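
The notion of words "common in the text but less common in the language as a whole" can be made concrete by comparing a word's rate in one text with its rate in a general frequency list. The Python sketch below is a hypothetical illustration, not a procedure from this study: the reference figures are invented, and the thresholds (`min_count`, `ratio`) and the default rate for unlisted words are arbitrary assumptions.

```python
from collections import Counter

def text_specific_words(tokens, reference_per_million, min_count=3, ratio=10.0):
    """List words frequent in this text but rare in a reference list.

    reference_per_million maps a word to its (hypothetical) frequency per
    million running words in the language as a whole; unlisted words are
    treated as having a frequency of 1 per million.
    """
    counts = Counter(tokens)
    total = len(tokens)
    result = []
    for word, count in counts.items():
        if count < min_count:
            continue
        text_rate = 1_000_000 * count / total
        general_rate = reference_per_million.get(word, 1.0)
        if text_rate / general_rate >= ratio:
            result.append(word)
    return sorted(result)

# Invented reference figures, for illustration only.
reference = {"the": 60000.0, "of": 30000.0, "sheep": 20.0, "band": 50.0}
tokens = ("the band of sheep " * 3 + "the dog").split()
print(text_specific_words(tokens, reference))
```

In this toy input THE and OF are frequent but no more so than a general list would predict, while BAND and SHEEP stand out as words the text itself has made common.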

Authors and editors would do better to focus on 
relating the content of texts to the audience than to focus 
on controlling vocabulary through use of word lists. A 
sense of audience and use of the natural constraints of the 
language will result in text wordings which are in keeping 
with the backgrounds of the intended readers and the 
strategies readers develop. 

Teachers concerned about vocabulary development would 
do better to focus on functional use of words and terms in 
the context of real texts than to resort to 
decontextualized lists or dictionary exercises. A text, after all, is 
considerably more than the sum of its words. 

The Program in Language and Literacy is an innovative effort to provide a center for a 
variety of activities dedicated to better knowledge of development in language and 
literacy and more effective school practice. The program is concerned with language 
processes as well as learning and teaching of language. 

Activities of the program have several main concentrations: 

Research on oral and written language: 

    on development of oral and written language. 

    on teaching for effective use of oral and written language. 

    on curriculum for language growth and use. 

    on bilingual, bicultural, biliterate language development and language instruction. 

    on issues of adult basic literacy. 

Theory development in oral and written language processes. 

Acquisition and instruction of oral and written language processes. 

Development of curriculum and methodology for effective monolingual and 
    bilingual school programs. 

Support for language and literacy components of pre-service teacher education 
    programs. 

In-service programs to help teachers, curriculum workers, and school administrators 
    to achieve more effective programs in language and literacy. 

Consultation to school systems and other agencies to plan and evaluate more 
    effective programs in language and literacy. 

Graduate courses, seminars, minors and combined majors in educational linguistics 
    to help educators become more effective as teachers, curriculum workers, 
    material developers and teacher educators. 

Conferences, workshops, symposia to provide dialogue among researchers, disseminators 
    and practitioners. 

Publications including working papers, position papers and research reports. 

The program focuses on written language. Written language is a receptive and productive 
process in a literate society where people have the alternative of using oral language 
in face-to-face situations or written language over time and space. 

The program is cross-disciplinary. It draws on a wide variety of bases (sociology, 
sociolinguistics, psycholinguistics, and areas of psychology) so that we can understand 
the learning of language and cognition and see the relationship of thought and language. 
We draw from other disciplines as well: on neurology, physiology, and of course pedagogy, 
the study of education itself. The Program in Language and Literacy is a program in 
educational linguistics. 

Staff: 

Dr. Kenneth S. and Yetta M. Goodman, Co-directors 
Faculty, Elementary Education, University of Arizona, Tucson 

Lois Bird, Research Assistant, Series Editor 
Suzanne Gespass, Research Assistant 
Myna Hausslar, Research Assistant 
Wendy Kasten, Research Assistant 
Sherry Vaughan, Research Assistant 
Sandra Wilde, Research Assistant 
Diana Ybarra, Secretary 
Dorothy McCormack, Secretary 
Order Form 
Occasional Papers 
Program in Language and Literacy 
Arizona Center for Research and Development 
4<B£ Education, Bldg. 69, College of Education 
University of Arizona, Tucson, AZ 85721 

Cost: $2 per copy          No. of copies 

No. 1: Goodman, K. S. and Goodman, Y. M. A WHOLE-LANGUAGE 
COMPREHENSION CENTERED VIEW OF READING DEVELOPMENT. 
A position paper, February, 1981. 

No. 2: Goodman, K. S. and Gollasch, F. V. WORD OMISSIONS IN 
READING: DELIBERATE AND NON-DELIBERATE. A research 
report, March, 1981. 

No. 3: Altwerger, B. and Goodman, K. S. STUDYING TEXT 
DIFFICULTY THROUGH MISCUE ANALYSIS. A research 
report, June, 1981. 

No. 4: Goodman, Y. M. and Altwerger, B. A STUDY OF LITERACY 
IN PRESCHOOL CHILDREN. A research report, September, 
1981. 

No. 5: Milz, Vera E. YOUNG CHILDREN WRITE: THE BEGINNINGS. 
February, 1982. 

No. 6: Goodman, K. S. and Bird, L. B. THE WORDING OF TEXTS: 
INTRA-TEXT WORD FREQUENCY. A research report, 
April, 1982. 

No. 7: Goodman, K. S. and Gollasch, F. V. RECONCEPTUALIZING 
INSERTIONS IN ORAL READING. A research report, in 
preparation. 

No. 8: Goodman, K. S. and Gespass, S. TEXT FEATURES AS THEY 
RELATE TO MISCUES: PRONOUNS. A research report, in 
preparation. 

No. 9: Goodman, K. S. and Gespass, S. TEXT FEATURES AS THEY 
RELATE TO MISCUES: DETERMINERS, THE and A/AN. A 
research report, in preparation. 

No. 10: Goodman, K. S. and Gespass, S. TEXT FEATURES AS THEY 
RELATE TO MISCUES: DIALOGUE AND DIALOGUE CARRIERS. 
A research report, in preparation. 

Total amount ($2 each) 

N.I.E. Final Report: Reading of American Children Whose 
Language is a Stable Rural Dialect of English or a Language 
Other Than English. Director: Kenneth S. Goodman (1978) 
($12 per copy) 

$1.00 handling per order* 

Please make check payable to Program in Language and Literacy. 

*Actual shipping charges will be added for large orders or overseas 
orders. 